Het-node2vec: second order random walk sampling for heterogeneous multigraphs embedding
We introduce a set of algorithms (Het-node2vec) that extend the original
node2vec node-neighborhood sampling method to heterogeneous multigraphs, i.e.
networks characterized by multiple types of nodes and edges. The resulting
random walk samples capture both the structural characteristics of the graph
and the semantics of the different types of nodes and edges. The proposed
algorithms can focus their attention on specific node or edge types, allowing
accurate representations even for underrepresented types of nodes and edges that
are of interest for the prediction problem under investigation. These rich and
well-focused representations can boost unsupervised and supervised learning on
heterogeneous graphs.
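The core mechanism can be illustrated with a toy sketch: a standard node2vec second-order walk (return parameter p, in-out parameter q) augmented with a multiplier that boosts transitions into a chosen node type. The multiplier alpha and its exact placement are illustrative assumptions, not the paper's precise weighting scheme.

```python
import random

def step_weight(prev, cur, nxt, graph, node_type, p, q, focus_type, alpha):
    """Unnormalized node2vec weight for moving cur -> nxt, given prev."""
    if nxt == prev:                      # return to previous node
        w = 1.0 / p
    elif nxt in graph[prev]:             # distance 1 from prev (BFS-like)
        w = 1.0
    else:                                # distance 2 from prev (DFS-like)
        w = 1.0 / q
    if node_type[nxt] == focus_type:     # boost the under-represented type
        w *= alpha
    return w

def random_walk(graph, node_type, start, length, p=1.0, q=1.0,
                focus_type=None, alpha=1.0):
    walk, prev, cur = [start], None, start
    for _ in range(length - 1):
        nbrs = list(graph[cur])
        if not nbrs:
            break
        if prev is None:
            nxt = random.choice(nbrs)
        else:
            weights = [step_weight(prev, cur, n, graph, node_type,
                                   p, q, focus_type, alpha) for n in nbrs]
            nxt = random.choices(nbrs, weights=weights, k=1)[0]
        walk.append(nxt)
        prev, cur = cur, nxt
    return walk

# Toy heterogeneous graph: genes (g*) and a disease node (d1).
graph = {"g1": {"g2", "d1"}, "g2": {"g1", "d1"}, "d1": {"g1", "g2"}}
node_type = {"g1": "gene", "g2": "gene", "d1": "disease"}
print(random_walk(graph, node_type, "g1", 10, p=0.5, q=2.0,
                  focus_type="disease", alpha=4.0))
```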
GraPE: fast and scalable Graph Processing and Embedding
Graph Representation Learning methods have enabled a wide range of learning problems to be addressed for data that can be represented in graph form. Nevertheless, several real-world problems in economics, biology, medicine and other fields raise serious scaling issues for existing methods and their software implementations, owing to the size of real-world graphs, which may comprise millions of nodes and billions of edges. We present GraPE, a software resource for graph processing and random-walk-based embedding that can scale to large, high-degree graphs and significantly speed up computation. GraPE comprises specialized data structures, algorithms, and a fast parallel implementation that delivers several orders of magnitude of improvement in empirical space and time complexity compared to state-of-the-art software resources, with a corresponding boost in the performance of machine learning methods for edge- and node-label prediction and for the unsupervised analysis of graphs. GraPE is designed to run on laptop and desktop computers, as well as on high-performance computing clusters.
GRAPE for fast and scalable graph processing and random-walk-based embedding
Graph representation learning methods opened new avenues for addressing complex, real-world problems represented by graphs. However, many graphs used in these applications comprise millions of nodes and billions of edges and are beyond the capabilities of current methods and software implementations. We present GRAPE (Graph Representation Learning, Prediction and Evaluation), a software resource for graph processing and embedding that is able to scale with big graphs by using specialized and smart data structures, algorithms, and a fast parallel implementation of random-walk-based methods. Compared with state-of-the-art software resources, GRAPE shows an improvement of orders of magnitude in empirical space and time complexity, as well as competitive edge- and node-label prediction performance. GRAPE comprises approximately 1.7 million well-documented lines of Python and Rust code and provides 69 node-embedding methods, 25 inference models, a collection of efficient graph-processing utilities, and over 80,000 graphs from the literature and other sources. Standardized interfaces allow a seamless integration of third-party libraries, while ready-to-use and modular pipelines permit an easy-to-use evaluation of graph-representation-learning methods, therefore also positioning GRAPE as a software resource that performs a fair comparison between methods and libraries for graph processing and embedding.
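As a hypothetical usage sketch, the workflow suggested by the abstract (load an edge list, then run a Rust-backed random-walk embedder) might look as follows. The class and method names are assumptions based on the library's description; consult the GRAPE documentation for the actual API.

```python
# Hypothetical GRAPE usage sketch; names below are assumptions, not the
# confirmed API. See https://github.com/AnacletoLAB/grape for specifics.
from grape import Graph                                # assumed entry point
from grape.embedders import Node2VecSkipGramEnsmallen  # assumed embedder name

# Load an edge list into GRAPE's compact graph representation.
graph = Graph.from_csv(
    edge_path="edges.tsv",            # hypothetical file
    sources_column="subject",
    destinations_column="object",
    directed=False,
)

# Compute node embeddings with a Rust-backed random-walk embedder.
embedding = Node2VecSkipGramEnsmallen().fit_transform(graph)
```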
Supervised learning with word embeddings derived from PubMed captures latent knowledge about protein kinases and cancer.
Inhibiting protein kinases (PKs) that cause cancers has been an important topic in cancer therapy for years. So far, almost 8% of >530 PKs have been targeted by FDA-approved medications, and around 150 protein kinase inhibitors (PKIs) have been tested in clinical trials. We present an approach based on natural language processing and machine learning to investigate the relations between PKs and cancers, predicting PKs whose inhibition would be efficacious to treat a certain cancer. Our approach represents PKs and cancers as semantically meaningful 100-dimensional vectors based on word and concept neighborhoods in PubMed abstracts. We use information about phase I-IV trials in ClinicalTrials.gov to construct a training set for random forest classification. Our results with historical data show that associations between PKs and specific cancers can be predicted years in advance with good accuracy. Our tool can be used to predict the relevance of inhibiting PKs for specific cancers and to support the design of well-focused clinical trials to discover novel PKIs for cancer therapy.
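A minimal sketch of the classification setup described above, assuming the 100-dimensional concept vectors for kinases and cancers are already available; the embeddings and trial-derived labels below are random placeholders, not the paper's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_pairs = 500
# Each example: a kinase vector concatenated with a cancer vector (100 + 100 dims).
X = rng.normal(size=(n_pairs, 200))
# Label: 1 if the pair reached a clinical trial (placeholder labels).
y = rng.integers(0, 2, size=n_pairs)

clf = RandomForestClassifier(n_estimators=300, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```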
KG-COVID-19: A Framework to Produce Customized Knowledge Graphs for COVID-19 Response.
Integrated, up-to-date data about SARS-CoV-2 and COVID-19 is crucial for the ongoing response to the COVID-19 pandemic by the biomedical research community. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community vary drastically for different tasks; the optimal data for a machine learning task, for example, differ substantially from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates heterogeneous biomedical data to produce knowledge graphs (KGs), and applied it to create a KG for COVID-19 response. This KG framework can also be applied to other problems in which siloed biomedical data must be quickly integrated for different research applications, including future pandemics.
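The ingest-and-integrate pattern the framework describes can be sketched as follows: each source is normalized into a common tabular edge format, and the results are merged into one graph. The file names, columns, and KGX-like schema here are illustrative assumptions.

```python
import csv

def transform_source(rows, source_name):
    """Normalize one source's records into (subject, predicate, object) edges."""
    for r in rows:
        yield {"subject": r["drug_id"], "predicate": "biolink:treats",
               "object": r["disease_id"], "provided_by": source_name}

def merge(edge_iterables, out_path):
    """Concatenate all transformed sources into a single edge TSV."""
    with open(out_path, "w", newline="") as f:
        w = csv.DictWriter(f, fieldnames=["subject", "predicate",
                                          "object", "provided_by"],
                           delimiter="\t")
        w.writeheader()
        for edges in edge_iterables:
            w.writerows(edges)

# Illustrative record: imatinib (CHEBI:45783) -> COVID-19 (MONDO:0100096).
src = [{"drug_id": "CHEBI:45783", "disease_id": "MONDO:0100096"}]
merge([transform_source(src, "example_source")], "merged_edges.tsv")
```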
KG-Hub: building and exchanging biological knowledge graphs.
MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking.
RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification.
AVAILABILITY AND IMPLEMENTATION: https://kghub.org
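As a sketch of downstream reuse, a KG-Hub build distributed as KGX-style node and edge TSVs could be loaded for analysis as follows; the file name is an illustrative assumption.

```python
import csv
import networkx as nx

# Assumes a KGX-style edge TSV (columns: subject, predicate, object)
# has already been downloaded from https://kghub.org; the file name
# below is an illustrative assumption.
G = nx.MultiDiGraph()
with open("merged-kg_edges.tsv") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        G.add_edge(row["subject"], row["object"], predicate=row["predicate"])

print(G.number_of_nodes(), G.number_of_edges())
```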
Semantic integration of clinical laboratory tests from electronic health records for deep phenotyping and biomarker discovery.
Electronic Health Record (EHR) systems typically define laboratory test results using the Laboratory Observation Identifier Names and Codes (LOINC) and can transmit them using Fast Healthcare Interoperability Resource (FHIR) standards. LOINC has not yet been semantically integrated with computational resources for phenotype analysis. Here, we provide a method for mapping LOINC-encoded laboratory test results transmitted in FHIR standards to Human Phenotype Ontology (HPO) terms. We annotated the medical implications of 2923 commonly used laboratory tests with HPO terms. Using these annotations, our software assesses laboratory test results and converts each result into an HPO term. We validated our approach with EHR data from 15,681 patients with respiratory complaints and identified known biomarkers for asthma. Finally, we provide a freely available SMART on FHIR application that can be used within EHR systems. Our approach allows readily available laboratory tests in EHR to be reused for deep phenotyping and exploits the hierarchical structure of HPO to integrate distinct tests that have comparable medical interpretations for association studies.
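The conversion step described above amounts to a lookup keyed by LOINC code and coded interpretation; a minimal sketch follows, with two illustrative annotations standing in for the paper's 2923.

```python
# Minimal sketch of the LOINC-to-HPO conversion: a lookup keyed by
# LOINC code and FHIR coded interpretation (H = high, L = low).
LOINC_TO_HPO = {
    ("2345-7", "H"): "HP:0003074",  # serum glucose high -> Hyperglycemia
    ("2345-7", "L"): "HP:0001943",  # serum glucose low  -> Hypoglycemia
}

def observation_to_hpo(loinc_code: str, interpretation: str):
    """Map a FHIR Observation's LOINC code + interpretation to an HPO term."""
    return LOINC_TO_HPO.get((loinc_code, interpretation))

print(observation_to_hpo("2345-7", "H"))  # HP:0003074
```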
The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species.
In biology and biomedicine, relating phenotypic outcomes with genetic variation and environmental factors remains a challenge: patient phenotypes may not match known diseases, candidate variants may be in genes that haven't been characterized, research organisms may not recapitulate human or veterinary diseases, environmental factors affecting disease outcomes are unknown or undocumented, and many resources must be queried to find potentially significant phenotypic associations. The Monarch Initiative (https://monarchinitiative.org) integrates information on genes, variants, genotypes, phenotypes and diseases in a variety of species, and allows powerful ontology-based search. We develop many widely adopted ontologies that together enable sophisticated computational analysis, mechanistic discovery and diagnostics of Mendelian diseases. Our algorithms and tools are widely used to identify animal models of human disease through phenotypic similarity, for differential diagnostics and to facilitate translational research. Launched in 2015, Monarch has grown with regard to data (new organisms, more sources, better modeling); new APIs and standards; ontologies (the new Mondo unified disease ontology, improvements to ontologies such as HPO and uPheno); user interface (a redesigned website); and community development. Monarch data, algorithms and tools are being used and extended by resources such as GA4GH and NCATS Translator, among others, to aid mechanistic discovery and diagnostics.
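The phenotypic-similarity matching mentioned above can be sketched in its simplest form: expand each phenotype profile to its ontology ancestors and compare the closures with Jaccard similarity. The tiny ontology and HP-style identifiers below are illustrative, and Monarch's production algorithms are more sophisticated (e.g., information-content weighted).

```python
# Toy ancestor table: each term maps to itself plus its ontology ancestors.
ANCESTORS = {
    "HP:A": {"HP:A", "HP:root"},
    "HP:B": {"HP:B", "HP:A", "HP:root"},
    "HP:C": {"HP:C", "HP:root"},
}

def closure(profile):
    """Expand a phenotype profile to the union of its ancestor sets."""
    return set().union(*(ANCESTORS[t] for t in profile))

def jaccard(p1, p2):
    c1, c2 = closure(p1), closure(p2)
    return len(c1 & c2) / len(c1 | c2)

patient = {"HP:B"}
model = {"HP:A", "HP:C"}
print(jaccard(patient, model))  # shared ancestors raise the score (0.5)
```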
Codes on Graphs and Analysis of Iterative Algorithms for Reconstructing Sparse Signals and Decoding of Check-Hybrid GLDPC Codes
The need for fast and efficient algorithms in different fields of communications and signal processing has led to the development of low-complexity iterative algorithms. In compressed sensing and channel coding, the two fields on which this dissertation focuses, designing low-complexity iterative algorithms with excellent performance has been of interest for many years. Recently, there has been significant interest in understanding the failures of iterative reconstruction and decoding algorithms: knowing how an algorithm fails can improve performance, either by guiding the design of new algorithms or by identifying conditions on the algorithm's input under which those failures cannot occur.

In the first part of this dissertation, we consider an iterative reconstruction algorithm called the interval-passing algorithm (IPA), originally introduced to reconstruct non-negative signals from binary measurement matrices. We first modify the IPA to reconstruct signals from non-negative measurement matrices and compare its performance with two other reconstruction algorithms, the verification algorithm and linear programming. The results show that the IPA offers a good trade-off between the very simple verification algorithm and the complex linear programming technique. We also show that the IPA fails on certain subgraphs of the Tanner graph of the measurement matrix, called stopping sets, and analyze its failures and successes on subsets of stopping sets. We provide sufficient conditions under which the IPA succeeds in recovering the signal, and we report the reconstruction performance of the IPA on different LDPC measurement matrices to show the effect of stopping sets.

In the second part of the dissertation, we provide a method for constructing a class of codes called check-hybrid generalized LDPC (CH-GLDPC) codes, in which some single parity checks are replaced by super checks corresponding to shorter, stronger error-correcting codes. The main feature of our method is to place the super checks carefully so that harmful structures in the Tanner graph of the LDPC code, called trapping sets, are eliminated. A second goal is to reduce the rate loss caused by the super checks by finding the minimum number of super checks needed to eliminate a given trapping set. To construct these codes, we first use knowledge of the trapping sets of LDPC codes over the binary symmetric channel (BSC) and systematically replace super checks to disable a trapping set. We then provide upper bounds on the minimum number of super checks needed to eliminate all trapping sets of a certain size in the Tanner graph of an LDPC code. The guaranteed error-correction capability of CH-GLDPC codes is also studied, and the results are extended to different classes of LDPC codes and iterative decoders.
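For concreteness, a simplified sketch of the interval-passing algorithm for the binary-matrix case follows: variable and check nodes exchange interval bounds on the signal coordinates until the intervals stop shrinking. The message schedule and the monotone (intersecting) update used here are textbook-style assumptions, not the dissertation's exact non-negative-matrix generalization.

```python
import numpy as np

def ipa(A, y, max_iter=100):
    """Interval-passing reconstruction of non-negative x from y = A @ x."""
    m, n = A.shape
    Nc = [np.flatnonzero(A[c]) for c in range(m)]     # variables per check
    Nv = [np.flatnonzero(A[:, v]) for v in range(n)]  # checks per variable
    lo = np.zeros((m, n))                             # variable-to-check lower bounds
    hi = np.zeros((m, n))                             # variable-to-check upper bounds
    for v in range(n):
        hi[Nv[v], v] = y[Nv[v]].min()  # x_v <= y_c for every neighboring check c
    for _ in range(max_iter):
        # Check-to-variable: subtract the other neighbors' bounds from y_c.
        clo, chi = np.zeros((m, n)), np.zeros((m, n))
        for c in range(m):
            s_lo, s_hi = lo[c, Nc[c]].sum(), hi[c, Nc[c]].sum()
            for v in Nc[c]:
                clo[c, v] = y[c] - (s_hi - hi[c, v])
                chi[c, v] = y[c] - (s_lo - lo[c, v])
        # Variable-to-check: intersect with the other checks' messages.
        new_lo, new_hi = lo.copy(), hi.copy()
        for v in range(n):
            for c in Nv[v]:
                others = [c2 for c2 in Nv[v] if c2 != c] or [c]
                new_lo[c, v] = max(lo[c, v], *(clo[c2, v] for c2 in others), 0.0)
                new_hi[c, v] = min(hi[c, v], *(chi[c2, v] for c2 in others))
        if np.allclose(new_lo, lo) and np.allclose(new_hi, hi):
            break  # intervals stopped shrinking
        lo, hi = new_lo, new_hi
    # Tightest per-variable interval over incident edges.
    x_lo = np.array([lo[Nv[v], v].max() for v in range(n)])
    x_hi = np.array([hi[Nv[v], v].min() for v in range(n)])
    return x_lo, x_hi

A = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
x = np.array([1.0, 2.0, 0.0])
lo, hi = ipa(A, A @ x)
print(lo, hi)  # intervals collapse to the true signal [1, 2, 0]
```

On this toy example the intervals collapse to the true signal; on measurement matrices whose Tanner graphs contain stopping sets, some intervals remain loose, which is exactly the failure mode analyzed above.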
ChIPWig: a state-of-the-art compression method for ChIP-seq data
Poster presentation for the BD2K All Hands Meeting in Washington, DC, November 29-30, 2016. Results were generated as part of the BD2K Targeted Software Development Program award to UIUC.